Importing all necessary libraries¶
In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
from scipy.stats import pearsonr
pio.renderers.default = 'notebook'
Loading the Data¶
In [2]:
pesticides=pd.read_csv('pesticides.csv')
yield_df=pd.read_csv('yield_df.csv')
A quick look at the data sets¶
This data set contains 4349 entries with 7 columns¶
In [3]:
pesticides.sample(10)
Out[3]:
| Domain | Area | Element | Item | Year | Unit | Value | |
|---|---|---|---|---|---|---|---|
| 2842 | Pesticides Use | Nicaragua | Use | Pesticides (total) | 1996 | tonnes of active ingredients | 2041.70 |
| 3283 | Pesticides Use | Russian Federation | Use | Pesticides (total) | 2009 | tonnes of active ingredients | 25961.00 |
| 2858 | Pesticides Use | Nicaragua | Use | Pesticides (total) | 2012 | tonnes of active ingredients | 4413.82 |
| 1181 | Pesticides Use | Denmark | Use | Pesticides (total) | 1994 | tonnes of active ingredients | 3922.00 |
| 347 | Pesticides Use | Belarus | Use | Pesticides (total) | 1992 | tonnes of active ingredients | 8306.31 |
| 3518 | Pesticides Use | Slovenia | Use | Pesticides (total) | 1992 | tonnes of active ingredients | 1309.00 |
| 2424 | Pesticides Use | Malaysia | Use | Pesticides (total) | 1994 | tonnes of active ingredients | 39406.48 |
| 3500 | Pesticides Use | Slovakia | Use | Pesticides (total) | 1999 | tonnes of active ingredients | 2831.90 |
| 560 | Pesticides Use | Brazil | Use | Pesticides (total) | 2016 | tonnes of active ingredients | 377176.00 |
| 458 | Pesticides Use | Bhutan | Use | Pesticides (total) | 1995 | tonnes of active ingredients | 15.70 |
In [4]:
pesticides.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 4349 entries, 0 to 4348 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Domain 4349 non-null object 1 Area 4349 non-null object 2 Element 4349 non-null object 3 Item 4349 non-null object 4 Year 4349 non-null int64 5 Unit 4349 non-null object 6 Value 4349 non-null float64 dtypes: float64(1), int64(1), object(5) memory usage: 238.0+ KB
This Data set contains 28242 entries with 7 columns¶
In [5]:
yield_df.sample(10)
Out[5]:
| Unnamed: 0 | Area | Item | Year | hg/ha_yield | average_rain_fall_mm_per_year | pesticides_tonnes | avg_temp | |
|---|---|---|---|---|---|---|---|---|
| 21201 | 21201 | Netherlands | Wheat | 2007 | 72714 | 778.0 | 12073.15 | 11.04 |
| 13438 | 13438 | India | Soybeans | 2007 | 12351 | 1083.0 | 27422.77 | 26.27 |
| 16553 | 16553 | Japan | Wheat | 1998 | 35111 | 1668.0 | 79821.18 | 16.45 |
| 17770 | 17770 | Lesotho | Potatoes | 1991 | 156250 | 788.0 | 0.30 | 14.66 |
| 21047 | 21047 | Nepal | Potatoes | 1992 | 85916 | 1500.0 | 60.11 | 14.95 |
| 18884 | 18884 | Mauritius | Cassava | 1992 | 140000 | 2041.0 | 687.86 | 23.81 |
| 12211 | 12211 | India | Soybeans | 1999 | 11380 | 1083.0 | 46195.00 | 25.97 |
| 5331 | 5331 | Burundi | Maize | 2012 | 11763 | 1274.0 | 98.59 | 21.44 |
| 13602 | 13602 | India | Sorghum | 2008 | 10209 | 1083.0 | 14485.33 | 26.83 |
| 14586 | 14586 | Indonesia | Cassava | 1991 | 120946 | 2702.0 | 3259.00 | 25.27 |
In [6]:
yield_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 28242 entries, 0 to 28241 Data columns (total 8 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Unnamed: 0 28242 non-null int64 1 Area 28242 non-null object 2 Item 28242 non-null object 3 Year 28242 non-null int64 4 hg/ha_yield 28242 non-null int64 5 average_rain_fall_mm_per_year 28242 non-null float64 6 pesticides_tonnes 28242 non-null float64 7 avg_temp 28242 non-null float64 dtypes: float64(3), int64(3), object(2) memory usage: 1.7+ MB
Exploratory Data Analysis and Visualization (Interactive Charts) using Plotly¶
Topics:¶
Top ten Countries used the highest amount of pesticides in the year of 2016
Pesticide usage in the number one country.
Application of pesticides in Bangladesh from 1990 to 2016.
Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)
Pesticide application and the Yields of some dominant crops in Bangladesh.
No.4 Analysis of Rice (The main dominating crop in Bangladesh)
Uneven distribution of Yields and Pesticide application in some past Years
Some Basic Analysis.
1. Top ten Countries used the highest amount of pesticides in the year of 2016¶
In [7]:
ten_pesticides_country=pesticides[pesticides.Year==2016][['Area','Value']].sort_values(by='Value',ascending=False).head(10)
In [8]:
fig=px.bar(ten_pesticides_country,
y='Area',
x='Value',
title='Top ten countries which used the highest amount of pesticides in 2016',
color=ten_pesticides_country.Area)
fig.update_layout(
yaxis_title='Country',
xaxis_title='Tonnes of active ingredients',
width=1200,
height=700
)
fig.show()
China(Main land) used the highest amount of Pesticides in the year of 2016¶
2. Pesticides Application in China (Different regions) from 1990 to 2016¶
In [9]:
pesticides_china=pesticides[pesticides.Area.isin(['China, Hong Kong SAR', 'China, Macao SAR','China, mainland', 'China, Taiwan Province of'])][['Area','Year','Value']]
In [10]:
fig=px.bar(
pesticides_china,
x='Year',
y='Value',
color='Area',
title='Pesticides Usage in China from 1990 to 2016',
)
fig.update_layout(
yaxis_title='Tonnes of active ingredients',
width=1200,
height=700
)
fig.show()
Mainland of China used the highest amount of Pesticides compared to other regions¶
2.1 Other regions of China¶
In [11]:
pest_china_other_area=pesticides[pesticides.Area.isin(['China, Hong Kong SAR', 'China, Macao SAR','China'])]
In [12]:
fig=px.bar(
pest_china_other_area,
x='Year',
y='Value',
color='Area',
title='Pesticides usage in China (Hong Kong SAR, Macao SAR)'
)
fig.update_layout(
yaxis_title='Tonnes of Active Ingredients',
width=1200,
height=700
)
fig.show()
Here Macao SAR used more pesticide than Hong Kong SAR over the past years¶
3. Application of pesticides and Crops Yield in Bangladesh from 1990 to 2013.¶
In [13]:
yield_df=yield_df.drop(columns='Unnamed: 0',inplace=False,axis=1)
yield_df['hg/ha_yield']=yield_df['hg/ha_yield']/10000
yield_df=yield_df.rename(columns={'hg/ha_yield':'ton/ha_yield'})
yield_bd=yield_df[yield_df.Area=='Bangladesh']
In [14]:
yield_pest_bd=yield_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [15]:
fig=px.bar(
yield_pest_bd,
x=yield_pest_bd.index,
y='pesticides_tonnes',
color='ton/ha_yield',
title='Pesticide Application and Crops Yield from 1990 to 2013 in BD'
)
fig.update_layout(yaxis_title='Pesticide Dose (Tonnes)',
width=1200,
height=700)
fig.show()
The application of pesticides in BD has increased exponentially over the past years (From 1990 to 2013)¶
4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)¶
In [41]:
fig=px.scatter(yield_df.sample(2000),
x='pesticides_tonnes',
y='ton/ha_yield',
color='Item',
size='ton/ha_yield',
hover_data=['Item','Year'],
hover_name='Area',
title='Realtionship between Crops Yield and Pesticide usage'
)
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700
)
fig.show()
In [17]:
correlation_AVG= yield_df[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_AVG)
ton/ha_yield pesticides_tonnes ton/ha_yield 1.000000 0.064085 pesticides_tonnes 0.064085 1.000000
Some countries use less amount of pesticides and get higher Crop yields and vice versa. For example Brazil uses a significant amount of pesticides, but tonne/ha yield is not that much compared to Switzerland, Germany, the United Kingdom, etc. also the correlation coefficient between ton/ha_yield and pesticides_tonnes is approximately 0.0641. Which is very close to 0, indicating a very weak positive linear relationship between pesticide usage and crop yield¶
4.1 Correlation Analysis In case of Bangladesh¶
In [18]:
fig=px.scatter(yield_bd,
x='pesticides_tonnes',
y='ton/ha_yield',
size='pesticides_tonnes',
color='Item',
hover_data='Year',
hover_name='Item',
title='Realtionship between Crops Yield and Pesticide usage (Bangladesh)')
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700)
fig.show()
In [19]:
correlation_BD= yield_bd[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_BD)
ton/ha_yield pesticides_tonnes ton/ha_yield 1.00000 0.14386 pesticides_tonnes 0.14386 1.00000
Same goes here, the correlation coefficient is 0.14386 Which is close to 0, indicating a weak positive linear relationship between pesticide usage and crop yield.¶
5. Pesticide application and the Yields of some dominant crops in Bangladesh.¶
In [20]:
fig=px.line(yield_bd,
x='pesticides_tonnes',
y=yield_bd['ton/ha_yield'],
hover_data=['Year'],
color='Item',
title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700
)
fig.show()
Potatoes tonne/ha yield is very high compared to other major crops. There was some sort of uneven distribution of pesticides and the yield of crops from 2008 to 2012¶
6. Analysis for Rice (The main dominating crop in Bangladesh)¶
In [21]:
yield_bd_rice=yield_bd[yield_bd.Item=='Rice, paddy']
In [22]:
fig=px.line(yield_bd_rice,
x='pesticides_tonnes',
y='ton/ha_yield',
title='Effects of pesticides usage on Rice yield in Bangladesh from 1990 to 2013',
hover_data=['Year']
)
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700
)
fig.show()
7. Uneven distribution of Crop Yields and Pesticide application in some of the past Years (2008 to 20012)¶
In [23]:
yield_8_9_10_11_12=yield_df[yield_df.Year.isin(range(2008,2013))]
yield_8_9_10_11_12_bd=yield_8_9_10_11_12[yield_8_9_10_11_12.Area=='Bangladesh']
yield_pests_8to12=yield_8_9_10_11_12_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [24]:
fig=px.bar(yield_pests_8to12,y='pesticides_tonnes',
x=yield_pests_8to12.index,
title='Pesticides usage and yield of crops from 2008 to 20012',
color='ton/ha_yield'
)
fig.update_layout(
yaxis_title='Pesticides Dose (Tonnes)',
xaxis_title='Year',
width=1200,
height=700
)
fig.show()
Both the application of pesticide and yield of crops were unevenly distributed. There could be some problem with the Data entry in those years¶
7.1. Distribution in some of the previous years (2000 to 2007)¶
In [25]:
yield_2000to2007=yield_df[yield_df.Year.isin([2000,2001,2002,2004,2005,2006,2007])]
yield_2000to2007bd=yield_2000to2007[yield_2000to2007.Area=='Bangladesh']
yield_2000to2007bd=yield_2000to2007bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [26]:
fig=px.bar(yield_2000to2007bd,y='pesticides_tonnes',
x=yield_2000to2007bd.index,
title='Pesticides usage from 2000 to 2007 in BD',
color='ton/ha_yield'
)
fig.update_layout(
yaxis_title='Pesticides dose (Tonnes)',
xaxis_title='Year',
width=1200,
height=700
)
However distribution ascended gradually from 2000 to 2007¶
8. Some Basic analysis¶
8.1 Top ten maize Producing Countries in 2013¶
In [27]:
crop_count=yield_df.groupby('Area')['ton/ha_yield'].sum()
maize_df=yield_df[yield_df.Item=='Maize'][['Area','Year','Item','ton/ha_yield']].sort_values(by='ton/ha_yield',ascending=False)
maize_df_13=maize_df[maize_df.Year==2013]
top_ten_maize_df_13=maize_df_13.groupby('Area')['ton/ha_yield'].sum().sort_values(ascending=False).head(10)
In [28]:
fig=px.bar(
top_ten_maize_df_13,
x=top_ten_maize_df_13.index,
y=top_ten_maize_df_13,
title='Top ten Maize producing countries in 2013',
color=top_ten_maize_df_13.index
)
fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()
8.2 Top ten Rice Producing Countries in 2013¶
In [29]:
rice_df=yield_df[yield_df.Item=='Rice, paddy']
rice_df=rice_df[rice_df.Year==2013]
rice_df
top_ten_rice_df_13=rice_df.groupby('Area')['ton/ha_yield'].sum()
top_ten_rice_df_13=top_ten_rice_df_13.sort_values(ascending=False)
top_ten_rice_df_13=top_ten_rice_df_13.head(10)
In [30]:
fig=px.bar(
top_ten_rice_df_13,
x=top_ten_rice_df_13.index,
y=top_ten_rice_df_13,
title='Top ten Rice producing countries in 2013',
color=top_ten_rice_df_13.index
)
fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()
8.3 Ten highest crops (Maize, Potatoes, Rice, Sorghum, Soybeans, Wheat, Cassava, Sweet potatoes, Plantains and others, Yams) yielding countries from 1990 to 2013¶
In [31]:
highest_crops_yield=yield_df.groupby('Area')['ton/ha_yield'].sum()
highest_crops_yield=highest_crops_yield.sort_values(ascending=False)
ten_highest_crops_yield=highest_crops_yield.head(10)
In [32]:
fig=px.bar(
ten_highest_crops_yield,
x=ten_highest_crops_yield.index,
y=ten_highest_crops_yield,
title='Ten highest crop yielding Countries (1990 to 2013)',
color=ten_highest_crops_yield.index
)
fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()